In a method of sharpness enhancement, an input signal (s) is filtered to obtain a filtered
signal, the filtered signal is multiplied (M2, M3) by a controllable fraction (cx, cy) to obtain
multiplied signals, and the multiplied signals are added (A2) to the input signal (s).
According to the invention, the controllable fraction (cx, cy) is generated by a non-linear
function (HCF, VCF).
The invention relates to a method and device for sharpness enhancement.

In scenarios where TV and PC systems merge their functionality, especially in the home
environment, high-quality display of video images in PC architectures becomes a challenging
issue for industries oriented to the consumer market. Since the spatial resolution of synthetic
PC images outperforms that of natural video scenes, the demand for improved quality in TV
images will increase. Sharpness, a subjective attribute determined by the human visual system,
is one of the most important factors in the perception of image quality. Although many
algorithms for 2-D sharpness enhancement have already been proposed, their effectiveness drops
in this context. In fact, the image processing literature often implicitly assumes that
operators designed for the enhancement of 2-D data can be applied straightforwardly to video
sequences too. As far as contrast sharpening is concerned, this is far from true: conventional
2-D techniques introduce many small artifacts and non-homogeneities which are perfectly
acceptable in a still picture but become very visible and annoying in an image sequence.
Moreover, the visibility of a defect strongly depends on the sequence contents, namely on the
amount of detail and of motion.

Moreover, in the PC environment even more attention has to be paid to decompressed images such
as MPEG, AVI, etc. In such cases, network and bus throughput constraints and limited storage
capacity impose the use of compression techniques, often operating at relatively low bit rates,
which cause visible blocking effects in the decompressed images. The artificial high
frequencies arising along the block borders are usually amplified by sharpness enhancement
algorithms, so that the blocking artifacts are emphasized.

The task of image enhancement techniques is often to emphasize the details of a scene so as to
make them more visible to a human viewer or to aid some machine performance (e.g. object
identification); at the same time, they should reduce noise or, at least, avoid amplifying it.
Algorithms for contrast enhancement are often employed interactively, with the choice of the
algorithm and the setting of its parameters depending on the specific application at hand.

A large number of approaches have been devised to improve the perceived quality of an image
[1]. Histogram equalization, a commonly used method, is based on mapping the input gray levels
so as to achieve a nearly uniform output distribution [1, 2]. However, histogram equalization
applied to the entire image has the disadvantage of attenuating low contrast in the sparsely
populated histogram regions. This problem can be alleviated by employing local histogram
equalization, which, however, has a high computational complexity.
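As an illustration of the global mapping described above, the following Python sketch equalizes
a flat list of integer gray levels through the normalized cumulative histogram. It is a common
textbook formulation chosen for illustration, not necessarily the exact method of [1] or [2];
the function name and the `levels` parameter are hypothetical.

```python
def histogram_equalize(pixels, levels=256):
    """Map integer gray levels in [0, levels) through the normalized cumulative histogram."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative histogram: cdf[g] = number of pixels with level <= g.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)  # count of the darkest occupied level
    if n == cdf_min:  # constant image: nothing to spread
        return list(pixels)
    # Stretch the occupied levels so that the output range is used roughly uniformly.
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1)) for p in pixels]


# A narrow two-level input is spread over the full 8-bit range.
print(histogram_equalize([10, 10, 11, 11]))  # [0, 0, 255, 255]
```

Because the mapping depends only on the global histogram, sparsely populated levels receive few
output levels, which is the contrast-attenuation drawback mentioned above.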

Another method, called statistical differencing, generates the enhanced image 
by dividing each pixel value by its standard deviation estimated inside a specified window 
centered at the pixel [2]. Thus, the amplitude of a pixel in the image is increased when it 
differs significantly from its neighbors, while it is decreased otherwise. A generalization of the 
statistical differencing method includes the contributions of pre-selected first-order and
second-order moments. 

An alternate approach to contrast enhancement is based on modifying the 
magnitude of the Fourier transform of an image while keeping the phase invariant. The 
transform magnitude is normalized to range between 0 and 1 and raised to a power which is a 
number between zero and one [3]. An inverse transform of the modified spectrum yields the
enhanced image. This conceptually simple approach in some cases results in unpleasant 
enhanced images with two types of artifacts: enhanced noise and replication of sharp edges. 
Moreover, this method is of high computational complexity. 
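The Fourier-domain approach of [3] can be sketched in one dimension as follows. To stay
self-contained, the sketch uses a naive O(N^2) DFT built on the standard `cmath` module; the
function names and the exponent `alpha` are hypothetical, and a practical implementation would
use an FFT on 2-D images.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]


def magnitude_root_enhance(x, alpha):
    """Normalize |X| to [0, 1], raise it to alpha (0 < alpha < 1), keep the phase, invert."""
    spectrum = dft(x)
    peak = max(abs(c) for c in spectrum)
    if peak == 0:  # all-zero input: nothing to enhance
        return [0.0] * len(x)
    modified = [(abs(c) / peak) ** alpha * cmath.exp(1j * cmath.phase(c)) for c in spectrum]
    # For a real input the modified spectrum stays Hermitian, so the imaginary part
    # of the inverse transform is only round-off and is discarded.
    return [v.real for v in idft(modified)]
```

With `alpha = 1` the output is simply the input scaled by 1/peak; with `alpha < 1` the weaker
spectral components (typically the high frequencies) are raised relative to the strongest one,
which is where both the enhancement and the noise amplification come from.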

A simple linear operator that can be used to enhance blurred images is Unsharp Masking (UM)
[1]. The unsharp masking approach exploits a property of the human visual system called the
Mach band effect. This property describes the visual phenomenon that the difference in the
perceived brightness of neighboring regions depends on the sharpness of the transition [1]; as
a result, the image sharpness can be improved by introducing more pronounced changes between
the image regions. The fundamental idea of UM is to subtract from the input signal a low-pass
filtered version of the signal itself. The same effect can, however, be obtained by adding to
the input signal a processed version of the signal in which high-frequency components are
enhanced; we shall refer to the latter formulation, schematically shown in Fig. 1. Fig. 1 shows
that a linearly high-pass filtered (HPF) version of the input signal x(n) is multiplied by a
factor λ and thereafter added to the input signal x(n) to form the output signal y(n). The
output of the high-pass filter HPF introduces an emphasis which makes the signal variations
sharper. Its effect is exemplified in Fig. 2, in which curve a shows the original signal,
curve b shows the first-order derivative of the input signal, curve c shows the second-order
derivative of the input signal, and curve d shows the original signal minus k times the
second-order derivative.

In one dimension, the UM operation, shown by the block diagram in Fig. 1, can 
be represented mathematically as: 

$$y(n) = x(n) + \lambda z(n)$$

where y(n) and x(n) denote the enhanced signal and the original signal, respectively, z(n) is
the sharpening component, and λ is a positive constant. A commonly used sharpening
component is the one obtained by a linear high-pass filter which can be, for example, the 
Laplacian operation given by 

$$z(n) = 2x(n) - x(n-1) - x(n+1).$$
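A minimal 1-D sketch of this linear UM, y(n) = x(n) + λz(n) with the Laplacian above, shows the
overshoot on a step edge. The function name and the choice of copying the two border samples
unchanged are assumptions made for the example.

```python
def unsharp_mask(x, lam):
    """Linear UM: y(n) = x(n) + lam * z(n), with z(n) = 2x(n) - x(n-1) - x(n+1).

    The two border samples have no complete neighborhood and are copied unchanged
    (a border policy chosen here for simplicity).
    """
    y = list(x)
    for n in range(1, len(x) - 1):
        z = 2 * x[n] - x[n - 1] - x[n + 1]  # sharpening component z(n)
        y[n] = x[n] + lam * z
    return y


# Undershoot before and overshoot after the step: [0, 0.0, -5.0, 15.0, 10.0, 10]
print(unsharp_mask([0, 0, 0, 10, 10, 10], 0.5))
```

The step is steepened, but the output also leaves the original range [0, 10]; this is the
overshoot on sharp details discussed next.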

Even though this method is simple to implement, it suffers from two drawbacks that can
significantly reduce its benefits. First, the operator introduces an excessive overshoot on
sharp details. Second, it also enhances the noise and/or digitization effects. The latter
problem comes from the fact that the UM method emphasizes the high-frequency components of the
input, amplifying a part of the spectrum in which the SNR (signal-to-noise ratio) is usually
low. The former arises because wide and abrupt luminance transitions in the input image produce
overshoot effects; these are put into further evidence by the human visual system through the
Mach band effect.

Several variants of the linear UM technique have been proposed in the literature to reduce the
noise amplification. A rather trivial approach consists in substituting a band-pass filter for
the high-pass one in Fig. 1. This reduces the noise effect, but also precludes effective detail
enhancement in most images.

Among more sophisticated approaches, Lee and Park [4] suggest using a modified Laplacian in the
UM scheme. They propose the order statistic (OS) Laplacian, whose output is proportional to the
difference between the local average and the median of the pixels in a window. They
demonstrate that the resulting filter introduces much smaller noise amplification than the
conventional UM filter, with comparable edge-enhancement characteristics. A different approach
is taken in [5], where the Laplacian filter is replaced with a very simple operator based on a
generalization of the so-called Teager's algorithm. An example of such an operator is the
simple quadratic filter given by

$$z(n) = x^2(n) - x(n-1)\,x(n+1).$$

It can be shown that this operator approximates the behavior of a local-mean-weighted
high-pass filter, having reduced high-frequency gain in dark image areas. According to Weber's
law [1], the sensitivity of the human visual system is higher in dark areas; hence the proposed
filter yields a smaller output in darker areas and therefore reduces the perceivable noise.
Even though the above quadratic operators do take Weber's law into account, their direct use in
unsharp masking may still introduce some visible noise, depending on the chosen enhancement
factor (λ in Fig. 1).
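The mean-weighting can be seen directly: writing x(n) = μ + δ(n) gives
z(n) = μ[2δ(n) − δ(n−1) − δ(n+1)] + δ²(n) − δ(n−1)δ(n+1), i.e. the Laplacian of the detail scaled
by the local level μ plus a second-order term. The short sketch below (function names are
illustrative) compares both operators on the same +2 detail over a dark and a bright background.

```python
def laplacian(x, n):
    """Linear sharpening component 2x(n) - x(n-1) - x(n+1)."""
    return 2 * x[n] - x[n - 1] - x[n + 1]


def teager_hpf(x, n):
    """Quadratic (Teager-type) operator z(n) = x(n)^2 - x(n-1) x(n+1)."""
    return x[n] ** 2 - x[n - 1] * x[n + 1]


dark, bright = [20, 22, 20], [200, 202, 200]
print(laplacian(dark, 1), laplacian(bright, 1))    # 4 4
print(teager_hpf(dark, 1), teager_hpf(bright, 1))  # 84 804
```

The linear Laplacian responds identically to both details, whereas the quadratic operator
responds roughly ten times less in the darker area, where Weber's law says noise is more
visible.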

In order to improve the performance of UM, in [6] the output of the high-pass filter is
multiplied by a control signal obtained from a quadratic edge sensor:

$$z(n) = [x(n-1) - x(n+1)]^2\,[2x(n) - x(n-1) - x(n+1)] \qquad (1.1)$$

Its purpose is to amplify only local luminance changes due to true image details. The first
factor on the right-hand side of Eq. (1.1) is the edge sensor. Its output is large only if the
difference between x(n-1) and x(n+1) is large enough, while the squaring operation prevents
small luminance variations due to noise from being interpreted as true image details. The
output of the edge sensor acts as a weight for the signal coming from the second factor in
Eq. (1.1), which is a simple linear high-pass filter.
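A per-sample sketch of Eq. (1.1) makes the weighting visible; the function name is
illustrative and the inputs are tiny hand-made signals.

```python
def edge_sensor_hpf(x, n):
    """Eq. (1.1): squared edge sensor [x(n-1) - x(n+1)]^2 times the linear Laplacian."""
    sensor = (x[n - 1] - x[n + 1]) ** 2
    return sensor * (2 * x[n] - x[n - 1] - x[n + 1])


print(edge_sensor_hpf([0, 9, 0], 1))       # 0: isolated impulse, x(n-1) == x(n+1)
print(edge_sensor_hpf([0, 0, 1, 1], 2))    # 1: small step
print(edge_sensor_hpf([0, 0, 10, 10], 2))  # 1000: large step
```

An isolated impulse is ignored entirely, but the weight grows without bound with the edge
amplitude, so the steepest edges receive the strongest emphasis; the rational variant below
addresses exactly this.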

Another nonlinear filter, the Rational UM technique, has been devised [7]: the output of the
high-pass filter is multiplied by a rational function of the local input data:

$$z(n) = \frac{[x(n-1) - x(n+1)]^2}{k\,[x(n-1) - x(n+1)]^4 + h}\,[2x(n) - x(n-1) - x(n+1)]$$

In this way, details having low and medium sharpness are enhanced; on the other hand, noise
amplification is very limited, and steep edges, which do not need further emphasis, remain
almost unaffected. From a computational viewpoint, this operator maintains almost the same
simplicity as the original linear UM method.
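The following sketch implements the rational control term d²/(k d⁴ + h), with
d = x(n−1) − x(n+1), assuming the fourth power in the denominator (the exponent is what makes the
gain vanish on steep edges). The function names and the values of `k`, `h` and `lam` are
illustrative only.

```python
def rational_gain(d, k, h):
    """Rational control term d^2 / (k d^4 + h); zero at d = 0, vanishing for large |d|."""
    return d * d / (k * d ** 4 + h)


def rational_um(x, lam, k, h):
    """1-D rational UM; border samples are copied unchanged."""
    y = list(x)
    for n in range(1, len(x) - 1):
        d = x[n - 1] - x[n + 1]
        y[n] = x[n] + lam * rational_gain(d, k, h) * (2 * x[n] - x[n - 1] - x[n + 1])
    return y


print(rational_gain(2, 1, 16))  # 0.125, the peak value for k = 1, h = 16
```

Setting the derivative with respect to d² to zero shows that the gain peaks at
|d| = (h/k)^(1/4) with value 1/(2√(hk)), so `k` and `h` place the band of "medium sharpness"
details that receive the largest boost.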

A similar approach is also proposed in [8]. The method follows the conventional unsharp masking
structure; however, the enhancement is allowed only in the direction of maximal change, and the
enhancement parameter is computed as a rational function similar to the one described in [7].
The operator enhances the true details, limits the overshoot near sharp edges and attenuates
noise in flat areas. Moreover, it is applied to color image enhancement by using an extension
of the gradient to multi-valued signals.

Finally, in [9] the unsharp masking technique is extended with an advanced 
adaptive control that uses the local image content. More precisely, they have concentrated on 
the following properties for adaptive control of the sharpness enhancement: 

• local intensity level and related noise visibility; 

• noise level contained by the signal; 

• local sharpness of the input signal; 
• aliasing prevention, where alias results from nonlinear processing such as clipping.

These four properties of the video signal are analyzed locally by separate units, and the
amount of sharpness enhancement depends on this analysis.

All the contrast enhancement methods proposed so far have been designed for the enhancement of
2-D data and cannot be applied to video sequences. In fact, they introduce many small artifacts
and non-homogeneities which are perfectly acceptable in a still picture but become very visible
and annoying in an image sequence. Moreover, conventional 2-D techniques cannot be applied to
the enhancement of block-coded image sequences, because they emphasize the blocking artifacts,
which then become very visible. The local spectra and bandwidth of both the noise and the
signal vary spatially, and the characteristics of the filters need to be locally adapted.

It is, inter alia, an object of the invention to provide an improved sharpness enhancement. To
this end, the invention provides a sharpness enhancement as defined in the independent claims.
Advantageous embodiments are defined in the dependent claims.

In a method of sharpness enhancement according to the invention, an input signal is filtered
to obtain a filtered signal, the filtered signal is multiplied by a controllable fraction to
obtain multiplied signals, and the multiplied signals are added to the input signal. According
to the invention, the controllable fraction is generated by a non-linear function.

These and other aspects of the invention will be apparent from and elucidated with reference
to the embodiments described hereinafter.

In the drawings:

Fig. 1 shows a linear unsharp masking structure;
Fig. 2 shows typical behavior of the unsharp masking technique;
Fig. 3 shows a block diagram of an embodiment of a 3-D unsharp masking method;
Fig. 4 shows a block diagram of an embodiment of a spatial filter for use in the embodiment of
Fig. 3;
Fig. 5 shows a block diagram of an embodiment of a control function for use in the spatial
filter of Fig. 3;
Fig. 6 shows a plot of a rational function;
Fig. 7 shows a plot of a temporal controlling function;
Fig. 8 shows a plot of a control function c̄x;



wo 00/42778 6 PCT/EPOO/00351 

Fig. 9 shows a 3x3 window of a non-linear filter; 

Fig. 10 shows a binary vector that indicates a position of coding artifacts; and 
Fig. 11 shows an autocorrelation of the binary vector of Fig. 10.

In this description, we propose a sharpness enhancement technique for video 
applications, taking into account the fact that the images can be blocked. The enhancement is 
accomplished by adding a correction signal to the luminance edges in an Unsharp Masking- 
like way. However, a nonlinear function is used to generate such a correction signal, and both 
spatial and temporal information is used to deal with significant but thin details and with 
blocking artifacts; at the same time, a temporal component of the operator reduces the noise 
amplitude. 

This description is organized as follows. In Chapters 2 and 3, a detailed 
description of our new method for video and block-coded image sequences is presented. In 
Chapter 4, we describe an algorithm for the automatic discrimination between video and 
block-coded images and propose an operator that can be applied to both types of sequences. In
Chapter 5, our conclusions are given. 

Basically, the enhancement is accomplished by adding overshoot to luminance edges in an Unsharp
Masking-like way. However, the optimal amount of overshoot for high image quality depends on
the local image statistics. Controls are introduced to improve the performance and to adapt the
operator to moving sequences with different characteristics. Both spatial and temporal
information is also used to enhance details while avoiding blocking artifacts and noise.

2. 3-D unsharp masking technique for video image sequences

The aim of a contrast enhancement technique is often to emphasize the details 
of a scene so as to make them more visible to a human viewer. Basically, the enhancement is 
accomplished by increasing the steepness of the significant luminance transitions, possibly 
also adding an overshoot to the edges around objects in the image. 

In this chapter, we present an Unsharp Masking-based approach to edge enhancement for TV
applications. The proposed scheme enhances the true details, limits the overshoot near sharp
edges, and attenuates the temporal artifacts. More precisely, to cope with the problems of
linear unsharp masking (noise sensitivity and excessive overshoot on sharp details), we have
extended this technique with an adaptive control that uses the local spatio-temporal image
content.

The block diagram of the proposed algorithm is depicted in Fig. 3. Let s(n, m, t+1) be the
input signal; the enhanced signal u(n, m, t) results from the contribution of three terms:

$$u(n,m,t) = s(n,m,t) + \lambda\,U_s(n,m,t) - U_t(n,m,t) \qquad (2.1)$$

where Us(n, m, t) is the output signal of a spatial non-linear high-pass filter SHPF that
introduces an emphasis which makes the signal variations sharper, without incurring significant
noise amplification or temporal artifacts. Ut(n, m, t) is the output of a temporal high-pass
filter THPF operating on three frames s(n, m, t+1), s(n, m, t), s(n, m, t-1), which suppresses
noise while preserving the sharpness of moving objects. Delay lines Z^-1 are present to
generate s(n, m, t) and s(n, m, t-1) from s(n, m, t+1). Multiplier M1 multiplies the output
signal Us(n, m, t) of the spatial non-linear high-pass filter SHPF by the factor λ. Adder A1
adds the output signal of the multiplier M1 to the once-delayed input signal s(n, m, t). The
output signal u(n, m, t) is displayed on a display device D.
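The frame-level structure of Eq. (2.1) can be sketched as a three-frame buffer that plays the
role of the two Z^-1 delay lines: each new frame s(t+1) that arrives releases the enhanced frame
u(t). The class name and the pluggable `spatial_term` / `temporal_term` callables (stand-ins for
SHPF and THPF) are hypothetical.

```python
from collections import deque


class UnsharpMasking3D:
    """Frame pipeline for Eq. (2.1): u(t) = s(t) + lam * Us(t) - Ut(t).

    spatial_term(frame) and temporal_term(prev, cur, nxt) are placeholders for the
    SHPF and THPF blocks; both must return frames (lists of rows) of the same size.
    """

    def __init__(self, lam, spatial_term, temporal_term):
        self.lam = lam
        self.spatial_term = spatial_term
        self.temporal_term = temporal_term
        self.frames = deque(maxlen=3)  # s(t-1), s(t), s(t+1): the two Z^-1 delays

    def push(self, frame):
        """Feed s(t+1); return u(t) once three frames are buffered, else None."""
        self.frames.append(frame)
        if len(self.frames) < 3:
            return None
        prev, cur, nxt = self.frames
        us = self.spatial_term(cur)               # SHPF output, scaled by M1
        ut = self.temporal_term(prev, cur, nxt)   # THPF output, subtracted
        return [[c + self.lam * a - b for c, a, b in zip(row_c, row_a, row_b)]
                for row_c, row_a, row_b in zip(cur, us, ut)]
```

With both terms returning zero frames, the pipeline reduces to a one-frame delay of the input,
which is a convenient sanity check before plugging in real spatial and temporal operators.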

The effect of the spatial and the temporal control functions on the image quality is explained
in Sec. 2.1 and Sec. 2.1.2, respectively.

2.1 Spatial operator 

The proposed algorithm is formed by a set of directional high-pass filters followed by
nonlinear correlation filters. In particular, we consider the separate effects of the control
signal along the horizontal and the vertical direction: this choice, apart from being simple,
is also justified by the fact that the eye is more sensitive to lines and edges having these
orientations [1].

The block diagram of the spatial high-pass filter SHPF is depicted in Fig. 4, and its
expression is:

$$U_s(n,m,t) = z_x(n,m,t)\,c_x(n,m,t) + z_y(n,m,t)\,c_y(n,m,t) \qquad (2.2)$$
where 

$$z_x(n,m,t) = 2s(n,m,t) - s(n,m-1,t) - s(n,m+1,t)$$
$$z_y(n,m,t) = 2s(n,m,t) - s(n-1,m,t) - s(n+1,m,t) \qquad (2.3)$$

are the outputs of the Laplacian filters (horizontal and vertical linear high-pass filters
HLHPF and VLHPF) applied horizontally and vertically, respectively, to the input image. Their
amplitude response is a monotone function of the frequency and is used to introduce the
required emphasis on the signal variation.
In order to improve the performance of the high-pass filter, we need to condition its operation
so as to emphasize only the medium-contrast details. To achieve this purpose, zx and zy are
multiplied by the control functions cx and cy, which depend on the local input data. The
control functions cx and cy are obtained by respective horizontal and vertical control function
blocks HCF and VCF, each of which receives the input signal s(n, m, t) of the spatial high-pass
filter SHPF, zx and zy. Multiplier M2 multiplies the output signal zx(n, m, t) of the
horizontal linear high-pass filter HLHPF by the output cx(n, m, t) of the horizontal control
function HCF. Multiplier M3 multiplies the output signal zy(n, m, t) of the vertical linear
high-pass filter VLHPF by the output cy(n, m, t) of the vertical control function VCF. Adder A2
sums the outputs of the multipliers M2, M3 to obtain the output signal Us(n, m, t) of the
spatial high-pass filter SHPF.
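A per-pixel sketch of Eqs. (2.2) and (2.3) follows. As a simplification for illustration, the
control callables `cx` and `cy` receive only the directional edge sensor (Eq. (2.5) for the
horizontal direction, and its vertical counterpart, assumed by symmetry) rather than the full
set of HCF/VCF inputs; the function name is hypothetical.

```python
def spatial_hpf(s, n, m, cx, cy):
    """Us(n, m) from Eqs. (2.2)-(2.3) at an interior pixel of the frame s[n][m].

    n indexes rows (vertical direction), m indexes columns (horizontal direction).
    """
    zx = 2 * s[n][m] - s[n][m - 1] - s[n][m + 1]  # horizontal Laplacian (HLHPF)
    zy = 2 * s[n][m] - s[n - 1][m] - s[n + 1][m]  # vertical Laplacian (VLHPF)
    dx = s[n][m - 1] - s[n][m + 1]                # horizontal edge sensor, Eq. (2.5)
    dy = s[n - 1][m] - s[n + 1][m]                # vertical edge sensor (assumed by symmetry)
    return zx * cx(dx) + zy * cy(dy)              # multipliers M2, M3 and adder A2


peak = [[0, 0, 0], [0, 10, 0], [0, 0, 0]]
print(spatial_hpf(peak, 1, 1, lambda d: 1.0, lambda d: 1.0))  # 40.0
```

With both controls fixed at 1 the operator reduces to a separable linear 2-D UM term; the
interest of the scheme lies entirely in how cx and cy vary with the local data.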

For the sake of simplicity, in the following we explain the operation of our 
method referring only to the x-direction. 


2.1.1 Horizontal control function 

The control function is a nonlinear control that uses the local image content to avoid noise
amplification, overshoot near sharp edges and temporal artifacts. To better understand its
behavior, the control function is shown in Fig. 5, which shows the part of the block diagram
enclosed in the dashed box in Fig. 4.

The control function consists of three major functional blocks: 

• a non-linear rational controlling function RF for the sharpening term; 

• a specialized control TLE for the enhancement of thin lines; 

• an IIR low-pass filtering IIR-LPF of the controlling function. 

In the following, the effect of these controls on the image quality will be briefly explained.

Rational function RF 

The controlling function we use is a rational function: 

$$c_x = \begin{cases} 1 & \text{if } d_1 < d_x < d_2 \\ [\ldots] & \text{if } d_x < d_1 \text{ or } d_x > d_2 \end{cases} \qquad (2.4)$$
The value this function assumes depends on an edge sensor:

$$d_x(n,m,t) = s(n,m-1,t) - s(n,m+1,t) \qquad (2.5)$$

It is a band-pass filter that discriminates between signal and noise. To this end, two implicit
hypotheses are made on the significant image details: first, that they are represented by local
gradient values which are larger than those introduced by noise; second, that they contribute
to the data spectrum mainly in its mid-frequency range. Therefore, this function tends to favor
high-gradient areas and is less sensitive to slow signal variations. Moreover, this mechanism
makes the proposed operator less sensitive to the Gaussian-distributed noise that is always
present in the data.

To better understand its behavior, Fig. 6 shows the non-linear control term cx as a function of
dx for a specific choice of d1 and d2. The main characteristics of the rational function cx
are:

1. cx = 1 for d1 < dx < d2: this characteristic makes the operator able to emphasize details
that are represented by low- and medium-amplitude luminance transitions;

2. cx → 0 for dx → 0: in homogeneous areas, where the noise is usually more visible, impulsive
noise yields small values of dx (smaller than d1), and hence a small value of the added
high-pass signal;

3. cx → 0 for dx → ∞: in this way, undesired overshoots near sharp edges are avoided. In fact,
sharp edges yield high values of dx (larger than d2), and cx will be small.

We can achieve the best balance among these effects by choosing suitable positions for d1 and
d2. The intensity of the enhancement can obviously be adjusted by changing the value of λ.
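The three properties above can be illustrated with a stand-in control function. Only the middle
branch (value 1 between d1 and d2) comes from the text; the outer branches of Eq. (2.4) are not
legible in this copy, so the sketch assumes a simple hypothetical shape, (|d|/d1)² below d1 and
(d2/|d|)² above d2, chosen only because it is continuous and satisfies properties 2 and 3. Using
|dx| rather than the signed sensor value is also an assumption.

```python
def control_rf(d, d1, d2):
    """Hypothetical stand-in for cx(dx) of Eq. (2.4); NOT the patent's exact formula.

    Middle branch (value 1 for d1 < |d| < d2) follows the text; the outer branches are
    an assumed shape giving c -> 0 for d -> 0 and for d -> infinity.
    """
    a = abs(d)
    if a <= d1:
        return (a / d1) ** 2  # small sensor output (noise, flat areas): little emphasis
    if a < d2:
        return 1.0            # medium-contrast details: full emphasis
    return (d2 / a) ** 2      # steep edges: emphasis reduced to avoid overshoot


print([control_rf(d, 4, 20) for d in (0, 2, 10, 40)])  # [0.0, 0.25, 1.0, 0.25]
```

Moving d1 trades noise rejection against the enhancement of faint details, while d2 sets where
the overshoot limitation starts; λ then scales the whole correction.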

Controlling the enhancement of thin lines TLE 

Even though experimental results support the validity of the rational function cx, it cannot
be used to emphasize thin lines. In fact, to reduce the noise sensitivity, the edge sensor is a
band-pass filter and is not able to detect lines having a thickness of one pixel. Therefore,
thin lines are erroneously interpreted as noise, and this drawback causes some artifacts.

For example, when noisy pixels are adjacent to the line itself, the output of the sensor varies
along the line. Hence, even for low-amplitude noise, the contribution of the correction term
cx zx may vary strongly because of the high slope of the control function near the origin,
combined with the very high value of the Laplacian. Temporal changes of the noise samples give
rise to a non-uniform enhancement, appearing as an annoying flicker effect close to the line.

Thin lines are thus emphasized in a non-homogeneous way. Moreover, the enhancement in the first
frame (left) is different from that in the second frame (right), which introduces a temporal
artifact that is very visible and annoying in an image sequence. Many small artifacts and
non-homogeneities are also introduced near lines having a slope close to, but different from,
0 or 90 degrees. In fact, in digital images these lines unavoidably show jagged borders
(staircase effect). In particular, one-pixel-wide lines are formed by segments whose width
alternates between one and two pixels. Because of its insensitivity to very thin lines, the
rational function enhances these lines in a non-homogeneous way, and the staircase effect is
therefore emphasized.

In order to overcome such problems, a simple but effective method is to include a proper
control to detect thin lines. To this purpose, we exploit the outputs zx and zy of the
Laplacian filters themselves. At a very thin line, the following three cases are possible:

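$$\text{horizontal line: } |z_x| \simeq 0 \text{ and } |z_y| \gg 0 \qquad (2.6)$$

$$\text{vertical line: } |z_x| \gg 0 \text{ and } |z_y| \simeq 0 \qquad (2.7)$$

$$\text{diagonal line: } |z_x| \gg 0 \text{ and } |z_y| \gg 0 \qquad (2.8)$$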

From these relationships, it is straightforward to discriminate between a pixel belonging to a
line and a noisy pixel, simply by setting a threshold Sn, which can be derived from statistical
considerations, as will be shown in the following. For example, when

$$|z_x| > S_n \text{ and } |z_y| < S_n,$$

a vertical line is detected, and it is only necessary to limit the overshoot when such a line
is already well visible. Therefore, once a thin line is detected by the thin line detector TLD,
the rational function is replaced by the following overpeaking controlling function OCF (see
Fig. 5) by means of a switch S that is controlled by the output signal of the thin line
detector TLD:



$$\bar{c}_x = \begin{cases} 1 & \text{if } |z_x| \le d_i \\ [\ldots] & \text{otherwise} \end{cases}$$

which simply limits the correction term zx when its value is larger than d_i.

The extension to cases (2.6) and (2.8) is similar. In fact, when condition (2.6) is true, a
horizontal line is detected and the vertical rational function cy is replaced by c̄y. Finally,
if condition (2.8) is verified, both cx and cy are replaced by c̄x and c̄y, respectively.
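The detection logic of (2.6) to (2.8), with Sn playing the role of the "≫ 0" threshold, can be
sketched as a small classifier that also reports which control functions the switch S would
swap for their overpeaking versions. The function names and return format are hypothetical;
the threshold Sn = 3√6 σ is the value derived in the following paragraphs.

```python
import math


def thin_line_threshold(sigma):
    """S_n = 3 * sigma_z = 3 * sqrt(6) * sigma, for noise standard deviation sigma."""
    return 3.0 * math.sqrt(6.0) * sigma


def detect_thin_line(zx, zy, sn):
    """Classify a pixel with the tests (2.6)-(2.8).

    Returns the line type and the control functions to be replaced by their
    overpeaking versions (c-bar); an empty tuple keeps the rational functions.
    """
    big_x, big_y = abs(zx) > sn, abs(zy) > sn
    if big_x and big_y:
        return "diagonal", ("cx", "cy")  # (2.8)
    if big_x:
        return "vertical", ("cx",)       # (2.7)
    if big_y:
        return "horizontal", ("cy",)     # (2.6)
    return "none", ()


print(detect_thin_line(20, 1, thin_line_threshold(1.0)))  # ('vertical', ('cx',))
```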

Sn is chosen to discriminate between a pixel belonging to a line and a noisy pixel. The value
of Sn cannot be too small, because many noise pixels would be erroneously interpreted as thin
lines; on the other hand, Sn cannot be too high, otherwise thin lines would not be detected.

A good solution is to determine the relationship between Sn and the noise variance present in
the image. For this purpose, we estimate the probability of erroneously labeling a noise pixel
as an important detail. Supposing a sequence of images degraded by zero-mean Gaussian noise
with variance σ², we estimate the probability of error when only noise is present.

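Because the samples {s(n+k, m) | k = -1, 0, 1} ∪ {s(n, m+k) | k = -1, 0, 1} are uncorrelated
Gaussian random variables with zero mean and variance σ², i.e. N(0, σ²), it is easy to show
that zx and zy are Gaussian random variables with zero mean and, since each is a weighted sum
with weights 2, -1, -1, variance

$$\sigma_{z_x}^2 = \sigma_{z_y}^2 = (4 + 1 + 1)\,\sigma^2 = 6\sigma^2.$$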

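Since zx and zy both depend on the value of s(n, m) (Eq. 2.3), they are correlated random
variables. Hence, their joint probability density function is

$$f(z_x, z_y) = \frac{1}{2\pi\sigma_{z_x}\sigma_{z_y}\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{z_x^2}{\sigma_{z_x}^2} - \frac{2\rho\,z_x z_y}{\sigma_{z_x}\sigma_{z_y}} + \frac{z_y^2}{\sigma_{z_y}^2}\right]\right\}$$

where

$$\rho = \frac{\mathrm{cov}[z_x, z_y]}{\sigma_{z_x}\sigma_{z_y}}.$$

Since zx, zy ~ N(0, 6σ²),

$$\rho = \frac{E[z_x z_y] - E[z_x]\,E[z_y]}{6\sigma^2} = \frac{4E[s(n,m)^2]}{6\sigma^2} = \frac{4\sigma^2}{6\sigma^2} = \frac{2}{3},$$

and the probability of error is

$$P_e = 1 - P\left(|z_x| < S_n \text{ and } |z_y| < S_n\right).$$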
By selecting Sn = 3σz = 3√6 σ, we obtain

$$P_e \simeq 1 - 0.995 = 0.005.$$

This means that in a 720 × 576 image, for example, more than 2000 noise points are amplified.
However, most of these errors will not be perceived thanks to the masking effect of the details
and of motion. Experimental results have shown that Sn = 3σz = 3√6 σ is a satisfying threshold
value.
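The figure Pe ≈ 0.005 can be checked numerically. The sketch below standardizes zx and zy,
uses the fact that zy given zx = x is Gaussian with mean ρx and variance 1 − ρ², and integrates
over x with Simpson's rule using only `math.erf`; the function names and the step count are
illustrative choices.

```python
import math


def normal_cdf(u):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def error_probability(k=3.0, rho=2.0 / 3.0, steps=2000):
    """P_e = 1 - P(|zx| < k sigma_z, |zy| < k sigma_z) for jointly Gaussian zx, zy.

    Integrates the standardized density of zx over [-k, k] times the conditional
    probability that zy also stays inside the band (Simpson's rule, even `steps`).
    """
    s = math.sqrt(1.0 - rho * rho)  # conditional std of zy given zx
    h = 2.0 * k / steps
    total = 0.0
    for i in range(steps + 1):
        x = -k + i * h
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        inner = normal_cdf((k - rho * x) / s) - normal_cdf((-k - rho * x) / s)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * pdf * inner
    return 1.0 - total * h / 3.0


# Only the centre pixel (tap 2 in both Laplacians) is shared: rho = 4 sigma^2 / 6 sigma^2.
rho = (2 * 2) / (2 * 2 + 1 + 1)
pe = error_probability(rho=rho)
print(rho, pe, pe * 720 * 576)
```

The printed Pe is about 0.005, and multiplying by the 414720 pixels of a 720 × 576 frame gives
an expected count slightly above 2000 falsely detected noise pixels, consistent with the text.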

It should be observed that this method can be applied only if the noise variance σ² is known,
and σ² is not easy to estimate. However, the annoying flicker effects close to the line and the
staircase effect are visible above all in those parts of a sequence that are almost
stationary. In this case, it is easy to estimate the noise variance [10]. The simplest
estimation method is frame averaging, which is very effective when the image does not change
from frame to frame but the degradation does. The simplest and most common form of frame
averaging is estimating an image f(n, m) from a sequence of N degraded image frames gi(n, m),
for 1 ≤ i ≤ N.

Suppose a sequence of degraded images gi(n, m) can be expressed as

$$g_i(n,m) = f(n,m) + \eta_i(n,m), \quad 1 \le i \le N \qquad (2.9)$$

where ηi(n, m) is zero-mean stationary white Gaussian noise with variance σ², and ηi(n, m) is
independent of ηj(n, m) for i ≠ j. If we assume that f(n, m) is nonrandom, the maximum
likelihood (ML) estimate of f(n, m), i.e. the one that maximizes the likelihood
p(g1, ..., gN | f), is given by

$$\hat{f}(n,m) = \frac{1}{N}\sum_{i=1}^{N} g_i(n,m) \qquad (2.10)$$

From (2.9) and (2.10),

$$\hat{f}(n,m) = f(n,m) + \frac{1}{N}\sum_{i=1}^{N} \eta_i(n,m) \qquad (2.11)$$

From (2.11), the degradation in the frame-averaged image remains zero-mean stationary white
Gaussian noise with variance σ²/N, which represents a reduction of the noise variance by a
factor of N in comparison with ηi(n, m).
This suggests estimating ηi, and hence σ², from gi(n, m) − f̂(n, m). The value of N must be high
in order to obtain an image f̂(n, m) with a small noise variance. It cannot be too high,
however, because the estimate of σ² would be wrong on sequences that are only almost
stationary.

Supposing that the enhancement operator is applied to sequences with a maximum value of
σ² = 30, we choose N = 5. The estimate of σ² will be wrong in conditions of fast motion, but in
this case the artifacts are less visible.
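A sketch of the frame-averaging estimate on a synthetic static scene follows. The residual
gi − f̂ = ηi − η̄ has variance σ²(1 − 1/N), so the sketch rescales the pooled residual power by
N/(N − 1); this bias correction is a standard step added here and is not stated in the text.
The scene, the seed and σ = 5 are made up for the example.

```python
import random


def estimate_noise_variance(frames):
    """Estimate sigma^2 from N frames g_i = f + eta_i of a static scene.

    f_hat is the frame average of Eq. (2.10); each residual g_i - f_hat has variance
    sigma^2 (1 - 1/N), hence the N / (N - 1) correction of the pooled residual power.
    """
    n_frames = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    f_hat = [[sum(g[r][c] for g in frames) / n_frames for c in range(cols)]
             for r in range(rows)]
    power = sum((g[r][c] - f_hat[r][c]) ** 2
                for g in frames for r in range(rows) for c in range(cols))
    sigma2 = power / (n_frames * rows * cols) * n_frames / (n_frames - 1)
    return sigma2, f_hat


random.seed(0)
f = [[(r * 7 + c * 3) % 50 + 100 for c in range(64)] for r in range(64)]  # synthetic scene
frames = [[[v + random.gauss(0.0, 5.0) for v in row] for row in f] for _ in range(5)]
sigma2, f_hat = estimate_noise_variance(frames)
print(round(sigma2, 1))  # close to the true value 25
```

The averaged frame f_hat itself carries noise of variance about σ²/N = 5, in line with Eq.
(2.11); with N = 5 the estimate is already stable for a frame of this size.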

IIR low-pass filtering

From Fig. 5, it can be noticed that the final value of the horizontal control function cx is
obtained by spatial low-pass filtering, for reasons that will be explained in the following.
Formally, its expression is:

$$c_x(n,m,t) = \frac{1}{2}\left[c_x(n-1,m,t) + \tilde{c}_x(n,m,t)\right] \qquad (2.12)$$

where c̃x coincides with the rational function of Eq. (2.4) when no thin line is detected, and
with the overpeaking function c̄x otherwise.

A non-uniform enhancement effect is present in highly detailed areas with complex and fine
contours, again due to small changes in the output of the sensor amplified by the slope of the
rational function, together with the high value of the Laplacian in such areas. This effect is
usually not perceived in 2-D enhancement, thanks to the masking effect of the details
themselves; however, it generates a flickering which is very annoying in those parts of a
sequence which are almost stationary. The sharpening is effective, but the edges are amplified
in a non-homogeneous way.

To make the operator more robust against the appearance of false details, cx is IIR filtered
as shown in Eq. (2.12): the values assumed by cx on neighboring pixels are computed by averaging
the current value with the previous one along the direction orthogonal to the control action.
In this way, it is possible to reduce the discontinuity of these values mainly along the
direction of the local borders, avoiding the generation of false details and the related
visible artifacts.

It has to be noted that a low-pass filter of order higher than two may generate artifacts,
because in this case we could average pixels that do not belong to the same object.
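The smoothing of Eq. (2.12) can be sketched as a first-order recursion running down the rows
(index n, orthogonal to the horizontal control action). The sketch assumes, as the name IIR
suggests, that the previous value in the recursion is the already smoothed one; copying the
first row unchanged is a border choice made here, and the function name is hypothetical.

```python
def iir_smooth_controls(c_raw):
    """First-order recursive smoothing of the control values along n (rows).

    c_raw[n][m] holds the unsmoothed control (rational or overpeaking value);
    c[n][m] = (c[n-1][m] + c_raw[n][m]) / 2, with c[n-1][m] already smoothed.
    """
    smoothed = [list(c_raw[0])]  # first row has no predecessor: copied unchanged
    for n in range(1, len(c_raw)):
        prev = smoothed[-1]
        smoothed.append([0.5 * (prev[m] + c_raw[n][m]) for m in range(len(c_raw[n]))])
    return smoothed


print(iir_smooth_controls([[1, 0], [0, 0], [0, 0]]))  # [[1, 0], [0.5, 0.0], [0.25, 0.0]]
```

An isolated jump in the control decays geometrically along the border direction instead of
switching abruptly from row to row, which is what suppresses the flicker described above.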




2.1.2 Temporal operator 
In video sequences, temporal correlation can be exploited to further reduce the noise. To
suppress noise while preserving the sharpness of moving objects, we propose a temporal filter
working on three frames. The basic assumption is that when a noisy pixel occurs, the
probability that it also occurs in the previous or the next frame in the same spatial position
is very small.

Let us denote the values of s in three subsequent frames as s(n, m, t-1), s(n, m, t) and
s(n, m, t+1). When the following conditions

$$s(n,m,t-1) \simeq s(n,m,t+1) \qquad (2.13)$$

$$|s(n,m,t) - s(n,m,t-1)| \gg 0 \qquad (2.14)$$
are both verified, a noisy pixel is probably present in a locally static area. The proposed
temporal filter is able to evaluate conditions (2.13) and (2.14) and at the same time to
perform low-pass filtering. Such filtering is achieved by subtracting the following high-pass
signal from the input signal:

$$U_t(n,m,t) = \frac{1}{4}\,z_t(n,m,t)\,c_t(n,m,t), \qquad (2.15)$$

where zt is a temporal Laplacian filter:

$$z_t(n,m,t) = 2s(n,m,t) - s(n,m,t-1) - s(n,m,t+1), \qquad (2.16)$$

and ct is a controlling function:



$$c_t(n,m,t) = \begin{cases} \dfrac{d_0^2 - d_t^2(n,m,t)}{d_0^2} & \text{if } |d_t(n,m,t)| < d_0 \\ 0 & \text{if } |d_t(n,m,t)| \ge d_0 \end{cases} \qquad (2.17)$$



Its shape is shown in Fig. 7; it can be seen that ct > 0 for |dt| < d0; therefore the temporal
low-pass filter attenuates signal variations which are smaller than a certain threshold. This
allows us to selectively smooth the signal variations that are attributed to noise corruption.

The temporal sensor $d_t$ is chosen to discriminate between a noisy pixel and a
detail. It must be a simple function which is able to evaluate conditions (2.13) and (2.14).
To this purpose, the band-pass filter $d_t = s(n,m,t-1) - s(n,m,t+1)$ gives acceptable results
with an extremely low computational complexity. As an example, let us consider the case of
$s(n,m,t)$ corrupted by noise, with no motion and a uniform area in the masks, so
that the spatial part of the filter is not allowed to operate: the values of $d_x$, $d_y$ and $d_t$ will be
small, so that $c_x = c_y = 0$ and $c_t = 1$.
Hence, from Eq. (2.1), the output of the filter becomes:

$$u(n,m,t) = \frac{2s(n,m,t) + s(n,m,t-1) + s(n,m,t+1)}{4},$$

which corresponds to a linear low-pass filter that reduces the noise. On the contrary, in the
presence of local movement $|d_t|$ is large, so that $c_t \approx 0$ and no filtering is performed.
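
The temporal operator with the band-pass sensor can be sketched per pixel as below. This is a hedged illustration under two assumptions not stated explicitly in this section: the control function is taken as $c_t = 1 - d_t^2/d_0^2$ for $|d_t| < d_0$ and 0 otherwise (a form consistent with the stated properties $c_t = 1$ for $d_t = 0$ and $c_t > 0$ inside the threshold), and the high-pass term $z_t c_t$ is scaled by 1/4, the factor that reproduces the three-frame average above when $c_t = 1$ and the spatial controls are zero. The function names are hypothetical.

```python
def control_t(d_t, d0):
    """Temporal control: 1 - d_t^2 / d0^2 inside |d_t| < d0, zero outside (assumed form)."""
    if abs(d_t) < d0:
        return (d0 * d0 - d_t * d_t) / (d0 * d0)
    return 0.0


def temporal_filter(prev, cur, nxt, d0):
    """Temporal part only (c_x = c_y = 0) with the band-pass sensor."""
    z_t = 2 * cur - prev - nxt          # temporal Laplacian z_t
    d_t = prev - nxt                    # band-pass sensor d_t
    # Assumed 1/4 scaling: with c_t = 1 this gives (2*cur + prev + nxt) / 4.
    return cur - 0.25 * z_t * control_t(d_t, d0)


print(temporal_filter(100, 120, 100, d0=10))  # static noisy pixel -> 110.0
print(temporal_filter(100, 120, 150, d0=10))  # motion, |d_t| >= d0 -> 120.0
```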

It has to be noted, however, that this version of the filter may generate artifacts
in particular conditions of fast motion. Let us consider, for example, the case of an object
moving quickly on a uniform background: if the object is present in the filter mask at time $t$,
but not at times $t-1$ and $t+1$, the value of $s(n,m,t)$ is considered as impulse noise by the
temporal part of the filter, and the object may be deleted. In an example, a disk with white and
black slices rotates with such a frequency as to satisfy conditions (2.13) and (2.14). Applying the
temporal operator with the band-pass sensor, some artifacts are introduced, which are visible as
light and dark thin lines. If a larger computational load can be accepted, this type of artifact is
eliminated by resorting to a Range Filter [11]. Experimental results show that the noise
reduction is the same with both operators. Therefore, the expression of $d_t$ becomes:

$$d_t(n,m,t) = d_{\max} - d_{\min},$$

$$d_{\max} = \max\{s(n,m,t-k) \mid k = -1,0,1\}, \qquad d_{\min} = \min\{s(n,m,t-k) \mid k = -1,0,1\}$$
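
To see why the range sensor removes the fast-motion artifact, the sketch below applies the same per-pixel temporal filter with both sensors to an object that appears only in frame $t$. As in any sketch of this operator here, the control function $c_t = 1 - d_t^2/d_0^2$ (for $|d_t| < d_0$, else 0) and the 1/4 scaling of $z_t c_t$ are assumptions, and the helper names are hypothetical.

```python
def control_t(d_t, d0):
    """Temporal control: 1 - d_t^2 / d0^2 inside |d_t| < d0, zero outside (assumed form)."""
    if abs(d_t) < d0:
        return (d0 * d0 - d_t * d_t) / (d0 * d0)
    return 0.0


def bandpass_sensor(prev, cur, nxt):
    return prev - nxt


def range_sensor(prev, cur, nxt):
    return max(prev, cur, nxt) - min(prev, cur, nxt)


def temporal_filter(prev, cur, nxt, d0, sensor):
    z_t = 2 * cur - prev - nxt          # temporal Laplacian z_t
    return cur - 0.25 * z_t * control_t(sensor(prev, cur, nxt), d0)


# Bright object (200) on a dark background (50), present only at time t.
print(temporal_filter(50, 200, 50, d0=10, sensor=bandpass_sensor))  # 125.0
print(temporal_filter(50, 200, 50, d0=10, sensor=range_sensor))     # 200.0
```

With the band-pass sensor the object is pulled toward the background, which is the artifact described above; with the range sensor the excursion at time $t$ makes $d_t$ large, so $c_t = 0$ and the object is kept. By the same logic an isolated impulse at time $t$ is also left untouched by the range sensor; the text reports that, experimentally, the noise reduction is nevertheless the same with both sensors.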

Simulations show that the processed image presents sharper edges which make 
it more pleasant to the human eye. Moreover, the amount of the noise in the image and the 
temporal artifacts are clearly reduced. 


2.2 Computational complexity analysis 

Now we analyze the computational complexity of the proposed algorithm.
From Fig. 4 it can be seen that the operator is separable; therefore we consider the effects of
the spatial and temporal operators separately. The operator needs 19 multiplications and 13 sums
per pixel. However, it has to be noted that the values of the non-linear functions $c_x$, $c_y$ and $c_t$
depend on the values of the edge sensors ($d_x$, $d_y$ and $d_t$), which assume only 255 different values.
Hence, the rational functions can be calculated by using a lookup table; in this case the number of
multiplications per pixel becomes 7 instead of 19. From a computational viewpoint, therefore, it
must be stressed that the proposed solution gives acceptable results with an extremely low
computational complexity.
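
The lookup-table idea can be sketched for the temporal control function: because $c_t$ depends only on $|d_t|$, which for 8-bit luminance lies in 0..255, one table of 256 entries replaces the per-pixel divisions. The sketch assumes the form $c_t = 1 - d_t^2/d_0^2$ for $|d_t| < d_0$ (0 otherwise) and a hypothetical threshold $d_0 = 20$; the same idea applies to $c_x$ and $c_y$. All names are hypothetical.

```python
def control_t(d_t, d0):
    """Direct evaluation of the (assumed) temporal control function."""
    if abs(d_t) < d0:
        return (d0 * d0 - d_t * d_t) / (d0 * d0)
    return 0.0


def build_lut(d0, max_abs=255):
    """Precompute c_t for every possible |d_t| of 8-bit data (once per sequence)."""
    return [control_t(d, d0) for d in range(max_abs + 1)]


LUT = build_lut(d0=20)


def control_t_lut(d_t):
    """Per-pixel evaluation becomes a single table read."""
    return LUT[abs(d_t)]


print(control_t_lut(-10), control_t_lut(25))  # 0.75 0.0
```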



3. Unsharp masking technique for block-coded image sequences



3.1 Introduction 

Most video data compression standards use block-based motion estimation and a
block-based discrete cosine transform (DCT). In particular, they use an $N \times N$ pixel-block
DCT to pack information into a few transform coefficients. This block-based DCT scheme
takes advantage of the local spatial correlation property of images. However, this type of
coding induces the well-known blocking artifacts, corner outliers and ringing noise when the
image is highly compressed.

Conventional 2-D techniques for contrast sharpening cannot be applied to the
enhancement of block-coded image sequences, because they emphasize these artifacts, which
become very visible and annoying in an image sequence. The local spectra and bandwidth of
both the noise and the signal vary spatially, and the characteristics of the filters need to be 
locally adapted. 

For this purpose, in this chapter we first present a comprehensive
characterization of the coding artifacts which are introduced into reconstructed video
sequences by the video coding algorithm, and then propose a simple
modification to the algorithm described in the previous chapter in order to emphasize the
details and reduce the visibility of the most typical coding artifacts.

3.2 Video coding distortions

The following discussion will be limited to descriptions of the artifacts' visual 
manifestations, causes and relationships. We also describe the spatial and temporal 
characteristics of video sequences that are susceptible to each artifact and in which the 
artifacts are visually prominent. 



3.2.1 Blocking effect

The blocking effect is an annoying side effect of techniques in which blocks of
pixels are treated as single entities and coded separately. In brief, because individual blocks
are coded in isolation, the level and characteristics of the coding error introduced
into a block may differ from one block to another. So, a smooth change of luminance across a
border can result in a step in the decoded image.

When we consider a smoothing operation to remove blocking artifacts in video
coding, three observations can be made. First, the human visual system is more sensitive



wo 00/42778 1 7 PCT/EPOO/00351 

to blocking artifacts in flat regions than in complex regions. Therefore, a strong smoothing filter
is required in those regions. In complex regions, however, smoothing a few pixels around the
block boundary is enough to achieve the desired de-blocking effect. Second, smoothing
operations tend to introduce more undesirable blur in complex regions than in flat regions.
Hence, adaptive smoothing that preserves image details is desirable in complex regions. Third,
because of motion compensation, blocking artifacts are propagated, and the propagated
artifacts are more visible in flat regions than in complex ones. Therefore, smoothing in these
regions must cover the inside of a block as well as the block boundaries.

3.2.2 Blurring 

Blurring manifests as a loss of spatial detail and a reduction in sharpness of 
edges in moderate to high spatial activity regions of frames, such as in roughly textured areas 
or around scene object edges. 

For intra-frame coded macroblocks, blurring is directly related to the
suppression of the higher-order DCT coefficients through coarse quantization, leaving only
the lower-order coefficients to represent the contents of a block; therefore, blurring can be
directly associated with low-pass filtering.

For predictive coded macroblocks, blurring is mainly a consequence of the use
of a predicted macroblock lacking spatial detail. However, blurring can also be induced
in bi-directionally predicted macroblocks, where the interpolation of the backward and
forward predictions results in an averaging of the contents of the final bi-directional
prediction. In both these cases, the blurred details are supplemented by the prediction error,
which supplies some higher-frequency information to the reconstruction, thereby reducing the
blurring effect.

From the observations above, an adaptive enhancement filter is required in 
complex regions in order to emphasize the details and reduce the visibility of the coding 
artifacts. 

3.2.3 Staircase effect 

The DCT basis images are not attuned to the representation of diagonal edges.
Consequently, more of the higher-activity basis images are required to satisfactorily represent
diagonal edges. Due to the typically low magnitude of the coefficients of the higher-order basis
images, coarse quantization results in their truncation to zero. The contribution originally made
by the higher-order basis images in forming the diagonal edge is diminished, resulting in a
reconstruction exhibiting only the characteristics of the lower-frequency basis images, which are
generally either horizontally or vertically oriented. So, for a block containing a diagonal edge angled
towards the horizontal, coarse quantization will result in a reconstruction with horizontal
orientation, and vice versa for blocks angled towards the vertical. Note that the step-wise
discontinuities occur at block boundaries.

3.2.4 Ringing

The ringing effect is related to the high-frequency distortions introduced by
DCT coefficient quantization. In fact, the representation of a block can be considered as a
carefully balanced aggregate of all the DCT basis images. Therefore, quantization of an
individual coefficient results in an error in the contribution made by the
corresponding basis image to a reconstructed block. Since the higher-frequency basis images
play a significant role in the representation of an edge, the quantized reconstruction of the
block will include high-frequency irregularities. This effect is most evident along high-contrast
edges in areas of generally smooth texture in the reconstruction.

3.2.5 False edges 

Because of motion compensation, blocking artifacts are propagated; hence,
false edges are a consequence of the transfer of the block-edge discontinuities formed by the
blocking effect into the current frame. As with the blocking effect, false edges are mainly
visible in smooth areas of predictive coded frames. The prediction error in such areas would
typically be minimal, or quantized to zero, and therefore the false edges would not be masked.

3.2.6 Motion-compensation mismatch 

Motion-compensation (MC) mismatch can be generally defined as the situation
in which a satisfactory prediction cannot be found for a particular macroblock, resulting in a
prediction whose spatial characteristics are mismatched with those of the current macroblock.
The consequence of such a situation is a high level of prediction error, which results in a
reconstruction with highly visible distortions, typically a high-frequency noise-like
effect.

3.3 Proposed algorithm 

Many de-blocking schemes have been proposed in still image coding, such as
JPEG, under the assumption that blocking artifacts are always located at block boundaries. In
video coding, however, the blocking artifacts of the previous frame can be propagated to the
current frame and can be located at any position in a block due to motion-compensated
prediction (Sec. 3.2.5). Therefore, simple block boundary smoothing is not good enough to
remove blocking artifacts appearing in video coding. Moreover, a low-pass effect is also
needed inside the macroblock to reduce ringing and motion-compensation mismatch effects.
Iterative methods based on projection onto convex sets [12, 13] may be candidate algorithms.
However, they are not adequate for real-time video coding due to their complexity.

In this section, we present an unsharp masking-based approach for noise
smoothing and edge enhancement. The block diagram is similar to the one shown in Fig. 3 and
Fig. 4, but it is necessary to modify the rational functions (Eq. 2.4) in order to enhance true
details and reduce coding artifacts.

The new rational functions $c_{xb}$ and $c_{yb}$ have to be simple and must obey several
restrictions in order to enhance true details and reduce noise:

1. They have to attenuate signal variations which are smaller than a certain threshold $h$. This will
allow us to selectively smooth the signal variations which are attributed to coding artifacts;

2. The mid-range details, i.e. the signal transitions having magnitude larger than $h$, must
be enhanced;

3. Excessive overshoot over sharp edges must be avoided.

A simple function verifying all these conditions can be formulated as a rational
function, that is, the ratio of two polynomial functions:



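$$c_{xb}(n,m,t) = \begin{cases} \dfrac{d_x'(n,m,t)^2 - h'}{k\,d_x'(n,m,t)^4 + 4\lambda h'} & \text{if } \dfrac{d_x'(n,m,t)^2 - h'}{k\,d_x'(n,m,t)^4 + 4\lambda h'} \le 1 \\ 1 & \text{otherwise} \end{cases}$$

$$c_{yb}(n,m,t) = \begin{cases} \dfrac{d_y'(n,m,t)^2 - h'}{k\,d_y'(n,m,t)^4 + 4\lambda h'} & \text{if } \dfrac{d_y'(n,m,t)^2 - h'}{k\,d_y'(n,m,t)^4 + 4\lambda h'} \le 1 \\ 1 & \text{otherwise} \end{cases}$$

where the edge sensors $d_x'$ and $d_y'$ are:

$$d_x'(n,m,t) = d_x(n,m,t)\,\frac{\sigma^2}{\sigma^2 + \sigma_{th}^2}, \qquad d_y'(n,m,t) = d_y(n,m,t)\,\frac{\sigma^2}{\sigma^2 + \sigma_{th}^2} \qquad (3.3)$$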



Variance-based measurements have proved to be a reliable and simple estimator of the degree of
local activity [14] and have been used for this purpose also in the present work. Here
$\sigma^2$ is the variance of the pixels belonging to a 3×3 window centered on the current pixel, and
$\sigma_{th}^2$ is a value to be fixed based on the average variance of the image. The parameters $k$ and $h'$
are suitable positive factors that are chosen to achieve the best trade-off between enhancement
of details having low and medium sharpness, reduction of noise, and overshoot near already
sharp edges. The relations between the parameters $d_1$, $d_2$, $k$ and $h'$ are $k = \ldots$ and
$h' = \ldots$, while the parameter $\lambda$ adjusts the enhancement intensity.
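
The rational control function and the variance-weighted edge sensor can be sketched as follows. This is an illustration under assumptions: it implements $c_b = \min\{(d'^2 - h')/(k d'^4 + 4\lambda h'),\ 1\}$ with $d' = d\,\sigma^2/(\sigma^2 + \sigma_{th}^2)$, the values of $k$, $h'$ and $\lambda$ in the example are arbitrary illustrative choices rather than values from the text (only $\sigma_{th}^2 = 400$ appears in the experiments), and the function names are hypothetical.

```python
def edge_sensor(d, var, var_th):
    """Variance-weighted sensor d' = d * var / (var + var_th)."""
    return d * var / (var + var_th)


def control_b(d_prime, k, h, lam):
    """Rational control c_b = (d'^2 - h') / (k d'^4 + 4 lam h'), clamped at 1."""
    r = (d_prime ** 2 - h) / (k * d_prime ** 4 + 4 * lam * h)
    return min(r, 1.0)


k, h, lam, var_th = 1e-5, 900.0, 1.0, 400.0   # illustrative values only
print(control_b(0.0, k, h, lam))    # flat area: -1/(4*lam) = -0.25
print(control_b(30.0, k, h, lam))   # d'^2 == h': 0.0
print(control_b(40.0, k, h, lam) > 0)                              # True
print(control_b(1000.0, k, h, lam) < control_b(40.0, k, h, lam))  # True
print(edge_sensor(20.0, 0.0, var_th), edge_sensor(20.0, 400.0, var_th))  # 0.0 10.0
```

The calls trace the shape described in the text: negative (smoothing) below $d'^2 = h'$, positive for mid-range transitions, and decaying again for very large transitions, which limits overshoot; a flat window ($\sigma^2 = 0$) forces $d' = 0$ and hence the smoothing value $-1/(4\lambda)$.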



Fig. 8 shows the shape of the horizontal rational control function. It can be seen
that $c_{xb} < 0$ for $d_x'^2 < h'$; therefore it attenuates signal variations that are smaller than a certain
threshold. To better understand the behavior of the filter, we describe its main characteristics
in flat and complex regions.

1. In uniform areas, $\sigma^2$ is negligible with respect to $\sigma_{th}^2$, so that $d_x'$ and $d_y'$ tend to
zero and $c_{xb} = c_{yb} = -\frac{1}{4\lambda} < 0$. Hence the output of the spatial filter
$s + \lambda[z_x c_x + z_y c_y]$ is a low-pass signal and the coding artifacts are reduced.

In particular, suppose that a vertical border occurs between uniform areas
(blocking effect; see Fig. 9): $\sigma^2$ will be small and $c_{xb} = -\frac{1}{4\lambda}$. Since we consider a vertical
block boundary VB, the previous value of the control function, $c_x(n-1,m)$, is similar to
$c_{xb}$. Then the output of the IIR filter coincides with $c_{xb}(n,m)$:

$$c_x(n,m) = \tfrac{1}{2}\left[c_x(n-1,m) + c_{xb}(n,m)\right] = c_{xb}(n,m) = -\frac{1}{4\lambda}.$$

Moreover, considering that on such a border $z_y$ will be small too, the output of
the filter depends only on the horizontal control function, and we obtain a horizontal linear
low-pass filter:

$$u_s(n,m) = \frac{2s(n,m) + s(n-1,m) + s(n+1,m)}{4},$$

which reduces the discontinuities between the boundaries of adjacent blocks.

Our operator also reduces ringing noise. Ringing noise along a vertical edge is
similar to the case considered above; in fact, the value of $z_y$ is small and the output of the filter
depends only on the horizontal control function. So, we obtain a horizontal low-pass filter.

As regards ringing noise around a diagonal edge, the values of $z_x$ and $z_y$ are not
zero and the output of the filter depends on both the horizontal and vertical control functions:

$$u_s(n,m) = s(n,m) + \lambda\left[z_x(n,m)\,c_x(n,m) + z_y(n,m)\,c_y(n,m)\right].$$

Since $\sigma^2$ is negligible with respect to $\sigma_{th}^2$, the values of the rational functions are:

$$c_{xb}(n,m) = c_{yb}(n,m) = -\frac{1}{4\lambda}.$$

It has to be noted that the previous values of the control functions are greater
than the current values:

$$c_x(n-1,m) > c_{xb}(n,m) \quad \text{and} \quad c_y(n,m-1) > c_{yb}(n,m),$$

because they are closer to the edge. This means that the output of the IIR low-pass filter is
greater than $-\frac{1}{4\lambda}$. Experimental results show that

$$c_x(n,m) = \tfrac{1}{2}\left[c_x(n-1,m) + c_{xb}(n,m)\right] \approx -\frac{1}{6\lambda}, \qquad c_y(n,m) = \tfrac{1}{2}\left[c_y(n,m-1) + c_{yb}(n,m)\right] \approx -\frac{1}{6\lambda}.$$

In this way, we obtain a 2-D linear low-pass filter; in fact, the expression of $u_s(n,m)$ is:

$$u_s(n,m) = \frac{2s(n,m) + s(n+1,m) + s(n-1,m) + s(n,m+1) + s(n,m-1)}{6}.$$
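
The two limiting cases above can be checked numerically. The sketch below, assuming the index convention used in these formulas ($z_x$ uses $s(n\pm1,m)$ and $z_y$ uses $s(n,m\pm1)$), evaluates $u_s = s + \lambda[z_x c_x + z_y c_y]$ with $c_x = -1/(4\lambda)$, $c_y = 0$ (block border) and with $c_x = c_y = -1/(6\lambda)$ (diagonal ringing), and compares the results with the corresponding averaging filters. The pixel values and $\lambda$ are arbitrary, and the function name is hypothetical.

```python
def um_output(s, n, m, cx, cy, lam):
    """u_s(n, m) = s + lam * (z_x * c_x + z_y * c_y) with Laplacian high-pass terms."""
    zx = 2 * s[n][m] - s[n - 1][m] - s[n + 1][m]
    zy = 2 * s[n][m] - s[n][m - 1] - s[n][m + 1]
    return s[n][m] + lam * (zx * cx + zy * cy)


s = [[1, 2, 3],
     [4, 8, 6],
     [7, 5, 9]]
lam = 2.0

# Vertical block border: c_x = -1/(4 lam), z_y term switched off (c_y = 0 here).
horiz = um_output(s, 1, 1, -1 / (4 * lam), 0.0, lam)
print(horiz, (2 * s[1][1] + s[0][1] + s[2][1]) / 4)          # 5.75 5.75

# Ringing near a diagonal edge: c_x = c_y = -1/(6 lam).
two_d = um_output(s, 1, 1, -1 / (6 * lam), -1 / (6 * lam), lam)
ref = (2 * s[1][1] + s[0][1] + s[2][1] + s[1][0] + s[1][2]) / 6
print(abs(two_d - ref) < 1e-12)                               # True
```

Whatever the value of $\lambda$, the factor $\lambda$ in the output cancels the $1/\lambda$ in the control value, which is why the smoothing strength in flat areas does not depend on the enhancement intensity.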

2. In detail areas, $\sigma^2 \gg \sigma_{th}^2$, and then $d_x' \approx d_x$, $d_y' \approx d_y$. Therefore, the mid-range
details represented by the signal transitions having magnitude larger than $h'$ will be enhanced.
On the other hand, when $d_x \approx 0$ (or $d_y \approx 0$) and $\sigma^2 \gg \sigma_{th}^2$, the area is uniform in the horizontal
(vertical) direction, so that low-pass filtering is applied.

If a smaller computational load is required, we can calculate the variance of
five pixels instead of nine. So, if we define $\sigma_5^2$ as the variance of the following five
pixels: $s(n,m)$, $s(n-1,m)$, $s(n+1,m)$, $s(n,m-1)$, $s(n,m+1)$, we can use $\sigma_5^2$ in Eq. 3.3.
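
The two variance estimators can be compared directly. The sketch below computes the nine-pixel (3×3) variance and the five-pixel cross variance $\sigma_5^2$ at the same position, using the population variance (division by the number of pixels); the function names are hypothetical.

```python
def variance(values):
    """Population variance (divide by the number of samples)."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def var9(s, n, m):
    """Variance of the 3x3 window centred on (n, m): nine pixels."""
    return variance([s[n + i][m + j] for i in (-1, 0, 1) for j in (-1, 0, 1)])


def var5(s, n, m):
    """Cheaper variance of the centre pixel and its four nearest neighbours."""
    return variance([s[n][m], s[n - 1][m], s[n + 1][m], s[n][m - 1], s[n][m + 1]])


spike = [[0, 0, 0],
         [0, 10, 0],
         [0, 0, 0]]
print(var5(spike, 1, 1))   # 16.0
print(var9(spike, 1, 1))   # about 9.88 (= 800/81)
```

Both estimators vanish on flat areas and grow with local activity; the five-pixel version needs fewer operations but, as the example shows, its values differ from the 3×3 estimate on the same data.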

3.4 Conclusions 

From the observations above, the filter for flat regions provides a smoothing
effect inside a block as well as on block boundaries, because in video coding the blocking and
ringing artifacts can be located at any position in a block due to motion-compensated
prediction.

On the other hand, the behavior of the filter is a nonlinear high-pass one in
complex areas, in order to enhance true details. The variance detector that regulates the filter
behavior is embedded in the filter itself, which allows a compact expression of the operator.

It has to be noted that the operator for the enhancement of block-coded image
sequences is similar to the one described in chapter 2. This suggests defining an enhancement
algorithm for both video and block-coded image sequences. For this purpose, a simple and
effective solution is proposed in chapter 4.

4. Automatic discrimination between video and block-coded image sequences

In chapter 2 we described an enhancement technique for video image
sequences that enhances the true details, limits the overshoot near sharp edges and attenuates
the temporal artifacts. This operator cannot be applied to the enhancement of block-coded
image sequences, because it emphasizes coding artifacts, which become very visible and
annoying.

For this task, a simple modification has been introduced when processing 
block-coded image sequences in order to emphasize the details and reduce the visibility of 
blocking and ringing effects (see chapter 3). 

However, this operator cannot be applied in video sequences, because it 
introduces significant blurring and does not amplify the low-contrast details. 

In order to define an enhancement technique for both video and coded image 
sequences, it is important to insert in the operator a mechanism that allows the behavior of the 
filter to change according to the presence of the coding artifacts. To this purpose, in this
chapter we insert a weight factor in the expression of the edge sensor that depends on a new
distortion measure for blocking and ringing artifacts.



4.1 Enhancement operator for both video and coded images

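From a comparison of the enhancement techniques described in chapters 2 and 3,
it can be seen that they differ in the rational function and the edge sensor. This suggests
defining a rational function and/or an edge sensor for both video and block-coded image
sequences. A simple and effective solution is the following:

$$c_{xb}(n,m) = \begin{cases} \dfrac{d_x'^2(n,m) - h'}{k\,d_x'^4(n,m) + 4\lambda h'} & \text{if } \dfrac{d_x'^2(n,m) - h'}{k\,d_x'^4(n,m) + 4\lambda h'} \le 1 \\ 1 & \text{otherwise} \end{cases} \qquad c_{yb}(n,m) = \begin{cases} \dfrac{d_y'^2(n,m) - h'}{k\,d_y'^4(n,m) + 4\lambda h'} & \text{if } \dfrac{d_y'^2(n,m) - h'}{k\,d_y'^4(n,m) + 4\lambda h'} \le 1 \\ 1 & \text{otherwise} \end{cases} \qquad (4.1)$$

where

$$d_x'(n,m) = d_x(n,m)\,\frac{\sigma^2}{\sigma^2 + \beta\,\sigma_{th}^2}, \qquad d_y'(n,m) = d_y(n,m)\,\frac{\sigma^2}{\sigma^2 + \beta\,\sigma_{th}^2} \qquad (4.2)$$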

where the parameter $\beta$ plays an important role in determining the bias of the filter (the
criterion for the determination of this weight will be explained in the following). In particular,
the value of $\beta$ changes according to the presence of coding artifacts, i.e. $\beta$ tends to zero in
uncompressed video sequences, while it increases in block-coded sequences. In this way, the
behavior of the filter becomes more and more low-pass as coding artifacts increase.
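
The role of $\beta$ in the adaptive sensor $d' = d\,\sigma^2/(\sigma^2 + \beta\sigma_{th}^2)$ can be illustrated numerically: for a fixed local variance, increasing $\beta$ shrinks $d'$, which pushes the rational control function toward its negative, smoothing range. A minimal sketch with arbitrary example values and a hypothetical function name; returning $d$ when the denominator vanishes (flat window with $\beta = 0$) is an assumption of this sketch, chosen to match the $\beta = 0$ behavior $d' = d$ described in the text.

```python
def edge_sensor_beta(d, var, var_th, beta):
    """Adaptive sensor d' = d * var / (var + beta * var_th)."""
    denom = var + beta * var_th
    if denom == 0:          # flat window with beta = 0: fall back to d' = d
        return float(d)
    return d * var / denom


d, var, var_th = 20.0, 100.0, 400.0
for beta in (0.0, 1.0, 4.0):
    print(beta, round(edge_sensor_beta(d, var, var_th, beta), 3))
# 0.0 20.0   (video: d' = d, no extra smoothing)
# 1.0 4.0    (same weighting as the chapter-3 sensor)
# 4.0 1.176  (heavily coded: strong attenuation)
```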

In video sequences, the constant $-h'$ in the numerator of $c_{xb}$ introduces a low-pass
effect aimed at reducing the noise that is always present in a real image. Moreover, it has to be
noted that in this case $d_x' = d_x$; therefore the filter does not introduce significant blurring near
low-contrast details. As mentioned above, the value of $\beta$ depends on the presence of coding
artifacts; therefore, in the following we introduce a new distortion measure for blocking and
ringing artifacts.

4.1.1 Distortion measure for blocking and ringing artifacts 

In order to define a metric for blockiness, various methods have been proposed
in the literature [15, 16, 17, 18]. All the techniques described suppose that the blocking artifacts are
always located at block boundaries, but in video coding the blocking artifacts can be located at
any position in a block due to motion-compensated prediction. Therefore, these methods are
not good enough to measure the presence of blocking artifacts. A simple solution would be to
apply these algorithms with different values of $N$; however, this approach is not adequate for
real-time video coding due to its complexity. Moreover, these techniques do not take into
account the ringing noise, which is very visible and annoying in image sequences.
We propose a method to evaluate the presence of both blocking and ringing artifacts that does
not need information about the position and size of a block.



Detection of the blocking effect 

Among the most typical artifacts that can be introduced by coding algorithms,
the blocking effect is an annoying side effect of techniques in which blocks of pixels are
treated as single entities and coded separately (see Sec. 3.2.1). In this case, a smooth change of
luminance across the border of a block can result in a step in the decoded image if neighboring
samples fall into different quantization intervals.

Then, if the coefficients of two adjacent blocks are coarsely quantized, we
expect to see two blocks of low variance and a difference in the intensity slope across the
block boundary. On the other hand, such an abrupt change in intensity slope across the block
boundaries is rather unlikely in the original unquantized image.

From the above consideration, it is clear that an appropriate method to detect a 
blocking effect is to find two adjacent blocks with a very small variance value and a different 
mean value. 

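For the sake of simplicity, we consider the horizontal direction. Let $s(n,m)$ be
the luminance level of the pixel in position $(n,m)$; we define $\mu_1$, $\sigma_1^2$, $\mu_2$, $\sigma_2^2$ as the mean
and variance of two adjacent 3×3 windows. For the first window, centered on $(n,m)$:

$$\mu_1 = \frac{1}{9}\sum_{i=-1}^{1}\sum_{j=-1}^{1} s(n-i,m-j) \qquad (4.4)$$

$$\sigma_1^2 = \frac{1}{9}\sum_{i=-1}^{1}\sum_{j=-1}^{1}\left[s(n-i,m-j) - \mu_1\right]^2 \qquad (4.5)$$

and $\mu_2$, $\sigma_2^2$ are defined in the same way on the horizontally adjacent window. When the
following conditions:

$$\sigma_1^2 \approx \sigma_2^2 \approx 0, \qquad |\mu_1 - \mu_2| \gg 0 \qquad (4.6)$$

are both verified, a blocking effect is probably present in a locally static area.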
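
The blocking test can be sketched as follows: compute mean and variance of two adjacent 3×3 windows and flag a pair of flat windows whose means differ by more than a threshold. Assumptions, not fixed by the text: the image is a 2-D list indexed as `s[n][m]`, with `n` taken as the horizontal direction (as in the filters of chapter 3); the second window is centered three samples further along `n`, so the two windows touch without overlapping; "flat" means exactly zero variance and the mean threshold is 5, the values reported as satisfactory in Sec. 4.1.2. The function names are hypothetical.

```python
def window_stats(s, n, m):
    """Mean and variance of the 3x3 window centred on (n, m)."""
    vals = [s[n + i][m + j] for i in (-1, 0, 1) for j in (-1, 0, 1)]
    mu = sum(vals) / 9
    return mu, sum((v - mu) ** 2 for v in vals) / 9


def is_blocking(s, n, m, mean_thr=5.0):
    """Two flat adjacent windows with clearly different means."""
    mu1, v1 = window_stats(s, n, m)
    mu2, v2 = window_stats(s, n + 3, m)    # assumed adjacent window along n
    return v1 == 0 and v2 == 0 and abs(mu1 - mu2) > mean_thr


# Six samples along n (each a row over m = 0..2): a luminance step 100 -> 110.
step = [[100] * 3 for _ in range(3)] + [[110] * 3 for _ in range(3)]
ramp = [[100 + 2 * n] * 3 for n in range(6)]
print(is_blocking(step, 1, 1), is_blocking(ramp, 1, 1))   # True False
```

The smooth ramp is not flagged because its windows have non-zero variance, which mirrors the argument above that such slope changes are unlikely to look like two flat, offset blocks.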
Detection of ringing noise

As mentioned in Sec. 3.2.4, ringing noise is the Gibbs phenomenon due to the
truncation of the high-frequency coefficients by quantization and, as such, it is most evident
along high-contrast edges in areas of generally smooth texture. It appears as a shimmering or
rippling outwards from the edge up to the boundary of the encompassing block.

From this consideration, a simple method to detect ringing noise is to
find two adjacent 3×3 windows with a similar mean value (because ringing noise has
zero mean) but a very different variance value. As done earlier, we consider only the
horizontal direction; let $\mu_1$, $\sigma_1^2$, $\mu_2$, $\sigma_2^2$ be the mean and variance of two adjacent windows
(see Eq. 4.4). When

$$\mu_1 \approx \mu_2, \qquad \sigma_1^2 \approx 0, \qquad \sigma_2^2 \gg 0 \qquad (4.7)$$

are verified, ringing noise is probably present. Ringing noise becomes more visible as the
bit-rate decreases.
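
A sketch of the ringing test, under the same assumptions as a blocking test on adjacent windows (indexing `s[n][m]`, second window centered three samples further along `n`, hypothetical names). Since either window may be the flat one, the test requires the smaller variance to be zero; the default thresholds (variance difference above 25, mean difference below 3) are those reported as satisfactory in Sec. 4.1.2.

```python
def window_stats(s, n, m):
    """Mean and variance of the 3x3 window centred on (n, m)."""
    vals = [s[n + i][m + j] for i in (-1, 0, 1) for j in (-1, 0, 1)]
    mu = sum(vals) / 9
    return mu, sum((v - mu) ** 2 for v in vals) / 9


def is_ringing(s, n, m, mean_thr=3.0, var_diff_thr=25.0):
    """Similar means, one flat window, one busy window."""
    mu1, v1 = window_stats(s, n, m)
    mu2, v2 = window_stats(s, n + 3, m)    # assumed adjacent window along n
    one_flat = min(v1, v2) == 0            # same as v1 + v2 == |v1 - v2|
    return one_flat and abs(v1 - v2) > var_diff_thr and abs(mu1 - mu2) < mean_thr


# Flat window at 100 next to a ripple whose mean is also 100.
flat = [[100] * 3 for _ in range(3)]
ripple = [[90, 110, 90], [110, 100, 110], [90, 110, 90]]
print(is_ringing(flat + ripple, 1, 1))   # True
print(is_ringing(flat + flat, 1, 1))     # False
```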

4.1.2 Determination of the β value

In order to define an enhancement technique for both video and block-coded
image sequences, we have inserted in the operator a mechanism that allows the behavior of the
filter to change according to the presence of the coding artifacts.

Equation 4.2 describes the approach that has been chosen in order to achieve
this goal, where $\beta$ takes a zero value in uncompressed images, while increasing in block-coded
images. In this way, the behavior of the filter ranges from a quasi-linear low-pass one in
uniform areas to a strongly nonlinear high-pass one in detailed areas. Moreover, it becomes
more and more low-pass as coding artifacts increase.

From the considerations in the previous section, it is clear that $\beta$ depends on the
amount of coding artifacts in the sequence. To this purpose, we will now show the relation
between $\beta$ and the number of coding artifacts detected, which takes into account the size of the
image and the number of frames used to estimate $\beta$.
For the sake of simplicity, we denote by $UM_{video}$, $UM_{block}$ and $UM_{adapt}$ the methods
described in chapter 2, chapter 3 and the preceding part of chapter 4, respectively. In order to detect
blocking and ringing artifacts, conditions (4.6) and (4.7) have to be verified. Experimental
results show that the following thresholds

$\sigma_1^2 = \sigma_2^2 = 0$ and $|\mu_1 - \mu_2| > 5$ for detection of the blocking effect, and

$|\sigma_1^2 - \sigma_2^2| > 25$, $\sigma_1^2 + \sigma_2^2 = |\sigma_1^2 - \sigma_2^2|$ and $|\mu_1 - \mu_2| < 3$ for detection of ringing noise

are satisfactory.
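
Counting the detections over a frame, as needed for the artifact counts used to estimate $\beta$, can be sketched by scanning every position with the thresholds just listed. Assumptions: the frame is a 2-D list indexed as `s[n][m]`, only the horizontal direction (along `n`) is examined, the second window is centered three samples further along `n`, positions too close to the border for both windows are skipped, and a position is counted as blocking or ringing, not both. The function name is hypothetical.

```python
def window_stats(s, n, m):
    """Mean and variance of the 3x3 window centred on (n, m)."""
    vals = [s[n + i][m + j] for i in (-1, 0, 1) for j in (-1, 0, 1)]
    mu = sum(vals) / 9
    return mu, sum((v - mu) ** 2 for v in vals) / 9


def count_artifacts(s):
    """Return (blocking_count, ringing_count) for one frame."""
    n_bl = n_rl = 0
    for n in range(1, len(s) - 4):            # both windows must fit: rows n-1 .. n+4
        for m in range(1, len(s[0]) - 1):
            mu1, v1 = window_stats(s, n, m)
            mu2, v2 = window_stats(s, n + 3, m)
            dmu = abs(mu1 - mu2)
            if v1 == 0 and v2 == 0 and dmu > 5:
                n_bl += 1                      # blocking: two flat windows, step in mean
            elif min(v1, v2) == 0 and abs(v1 - v2) > 25 and dmu < 3:
                n_rl += 1                      # ringing: flat window next to a ripple
    return n_bl, n_rl


# Twelve samples along n: a flat area at 100 followed by a flat area at 120.
frame = [[100] * 3 for _ in range(6)] + [[120] * 3 for _ in range(6)]
print(count_artifacts(frame))   # (1, 0): only the window pair straddling the step
```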

In our experiments, we used image sequences after MPEG coding of 576 × 720
frames at various bit-rates. Since the blocking effect is more visible in I frames than in B
and P frames due to motion compensation, the results take into account different frames.

$N_{bl}$ and $N_{rl}$ indicate how many pixels are interpreted as blocking effect and
ringing noise, respectively. $N_{bl} = N_{rl} = 0$ for uncompressed video sequences, while they increase
in block-coded sequences as the bit-rate decreases. It has to be noted that in some sequences
the ringing noise is more visible than the blocking effect.

From the above considerations, it is clear that $\beta$ depends on the values of $N_{bl}$
and $N_{rl}$. Obviously, the number of coding artifacts also depends on the size of the image and
on the number of frames considered; hence, letting $N_c$ and $N_r$ be the width and the height of the
image and $N_f$ the number of frames used to calculate $N_{bl}$ and $N_{rl}$, an appropriate measure for $\beta$
is:

$$\beta = \alpha\,\frac{N_{bl} + N_{rl}}{N_c\,N_r\,N_f} \qquad (5.1)$$

where $\alpha$ is a constant weight which can be derived from experimental results. In our
experiments, $N_c = 720$, $N_r = 576$, $N_f = 4$ and $\alpha = 3000$.
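
The $\beta$ estimate with the experimental constants quoted above can be written as a one-line function. The arithmetic shows the scale involved: $N_c N_r N_f = 720 \cdot 576 \cdot 4 = 1\,658\,880$, so with $\alpha = 3000$ about 553 detections over the four frames give $\beta \approx 1$. The function name is hypothetical.

```python
def estimate_beta(n_bl, n_rl, width=720, height=576, frames=4, alpha=3000.0):
    """beta = alpha * (N_bl + N_rl) / (N_c * N_r * N_f)."""
    return alpha * (n_bl + n_rl) / (width * height * frames)


print(estimate_beta(0, 0))                 # uncompressed sequence: 0.0
print(round(estimate_beta(400, 153), 3))   # 553 detections: 1.0
```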

Our method detects both blocking and ringing artifacts, and it does not need any
information about the position and size of the blocks. Moreover, $\beta$ always takes a zero value
on uncompressed images; therefore, it is very simple to distinguish a coded image from a video
one.

In order to further test the noise robustness of the proposed method, an original
sequence has been corrupted using Gaussian-distributed noise of variance 50. If a prior-art
linear UM filtering ($\lambda = 0.5$) is applied, the image is well sharpened, but the noise is amplified
and becomes so relevant that it degrades the output scene. The best visual quality is obtained
using the proposed $UM_{video}$ filtering structure ($\lambda = 1.2$, $d_1 = 30$ and $d_2 = 40$); in fact, the spatial
operator (see Sec. 2.1) enhances the medium-contrast details and the temporal filter (see
Sec. 2.1.2) reduces the noise.

A comparison of the enhancement obtained with the $UM_{video}$, $UM_{block}$ and $UM_{adapt}$
methods yields the following results. Using the $UM_{video}$ operator, with $\lambda = 1$, $d_1 = 30$ and $d_2 =
40$, the processed image is sharp and the noise is almost absent. The $UM_{block}$ method ($\lambda = 1$,
$d_1 = 30$, $d_2 = 40$, $\sigma_{th}^2 = 400$) shows an improvement of the quality in homogeneous areas;
nevertheless, it introduces significant blurring. In fact, details of intermediate strength are not
as well defined as in the previous case. Finally, applying the $UM_{adapt}$ method to the original
image, it can be seen that the processed image is similar to that of $UM_{video}$. In fact, in this
sequence $N_{bl} = N_{rl} = 0$ and $\beta = 0$. Therefore, the low-pass effect of the filter is reduced and the
low-contrast details are amplified. This means that $UM_{adapt}$ can be applied to both video
and compressed images.

The results on compressed image sequences are as follows.
Applying a conventional linear UM operator ($\lambda = 0.5$) to the coded and decoded image, the
processed image is sharp, but the ringing effect is very visible in the uniform areas.
The image restored by the $UM_{block}$ operator ($\lambda = 1$, $d_1 = 30$, $d_2 = 40$, $\sigma_{th}^2 = 400$) is free of the
ringing effect and the edges are enhanced. The image enhanced by the $UM_{block}$ method is also free of
blocking artifacts. There is practically no blurring of the texture. Further, edges are amplified and the
image is sharper. Obviously, the same results are obtained with the $UM_{adapt}$ operator, because
in compressed images $\beta \gg 0$. However, its effectiveness depends on the value of $\beta$. When $\beta =
0$, $\lambda = 1$, $d_1 = 30$, $d_2 = 40$ and $\sigma_{th}^2 = 200$, as expected, the sharpening is effective, especially for
medium-contrast details. However, the coding artifacts are amplified in smooth areas.
Using $\beta = 1$, the homogeneous areas are less noisy than in the case of $\beta = 0$, while good
sharpening is still achieved in the detail areas. Finally, with $\beta = 4$, it can be seen that coding
artifacts are removed, but significant blurring is introduced.

From these considerations, it is clear that $\beta$ cannot be a constant value, but it
has to depend on the presence and amount of coding artifacts in the images. Equation
5.1 describes the approach that has been chosen in order to achieve this goal.

4.1.3 Observations 
1. Most of the proposed algorithms defining a metric for blockiness suppose that the
blocking artifacts are always located at block boundaries, but in video coding the blocking
artifacts can be located at any position due to motion-compensated prediction. Our
distortion measure does not need any information about the position and size of the blocks,
and it also detects ringing noise.

2. From a computational viewpoint, it must be stressed that the proposed solution
maintains almost the same simplicity as the method described in chapter 3 for the
enhancement of compressed sequences. In fact, in order to detect blocking and ringing
artifacts we use the variance value which has already been estimated in the expression of the
rational function (see Eq. 3.3).

3. If our method is applied to the I frames, it is possible to determine the size of a
block; in fact, in these frames the blocking and ringing artifacts are always located at block
boundaries. A simple method is described in the following where, for the sake of simplicity,
we consider only the horizontal direction. First, we define a vector $pos_x$ of length $N_c$, where $N_c$
is the width of the image. When a blocking or ringing artifact A is detected in a block-coded
image BCI, the respective position of $pos_x$ is set to 1. In this way, a binary vector $pos_x$ is
generated, where a value of 1 indicates the position of a coding artifact (see Fig. 10). Since the
blocking and ringing artifacts A are located at block boundaries BB, $pos_x$ can be expressed as:

$$pos_x(n) = \sum_{k=1}^{N_c/B} \delta(n - kB)$$

where $B$ is the size of the blocks. Then, the autocorrelation of $pos_x$:

$$R_x(n) = \sum_{i=1}^{N_c} pos_x(i)\,pos_x(i+n) \qquad (4.8)$$

is maximum when $n = kB$. Therefore, in order to determine the size of the block, we can
simply calculate the position of the first maximum of $R_x$ and $R_y$ for $n > 0$.
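
The block-size estimate can be sketched as below. It is an illustration under assumptions: `pos` is a 0/1 list built as described above, terms with $i + n$ beyond the end of the vector are treated as zero, the search runs over lags $1 \le n \le N_c/2$, and ties are resolved in favor of the smallest lag (the "first maximum"). The names are hypothetical.

```python
def autocorr(pos, lag):
    """R(lag) = sum_i pos[i] * pos[i + lag], with out-of-range terms taken as zero."""
    return sum(pos[i] * pos[i + lag] for i in range(len(pos) - lag))


def estimate_block_size(pos):
    """Smallest positive lag at which the autocorrelation of pos is maximal."""
    lags = range(1, len(pos) // 2 + 1)
    r = [autocorr(pos, lag) for lag in lags]
    return lags[r.index(max(r))]


# Artifacts detected on the boundaries of 8-pixel blocks across a 64-pixel row
# (0-based indices 7, 15, ..., 63, i.e. n = 8, 16, ..., 64 in 1-based notation).
pos_x = [1 if (i + 1) % 8 == 0 else 0 for i in range(64)]
print(autocorr(pos_x, 8), autocorr(pos_x, 4), estimate_block_size(pos_x))  # 7 0 8
```

Because fewer boundary pairs overlap at larger lags, $R(B) > R(2B) > \dots$, so the first maximum over positive lags falls at the block size itself.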

As an example, Fig. 11 shows the result on a given scene after coding at
1.5 Mb/s. It has to be noted that $R_x(n)$ and $R_y(n)$ take a zero value for $n \neq 8k$; therefore, the
image is segmented into 8×8 blocks.



4. Our method cannot be applied to graphical images, because in them an abrupt change in
intensity slope is rather likely; hence many sharp edges would be erroneously interpreted as
blocking effect. Moreover, in these images the variance in uniform areas takes a zero value;
hence conditions (4.6) are verified for many pixels.

In order to distinguish a coded image from a graphical one, we observe that
in the latter the sharp edges can be located at any position; therefore, the autocorrelation of
$pos_x$ and $pos_y$ takes a high value also for $n = 1$ (see Eq. 4.8):

Rx(i) = S POSx(i) • poSx(i+l) 

Ry(l) = 2 POSy(i) • poSy(i+l) 

From the above considerations, it is simple to discriminate between the different types of images; the following three cases are possible:
(a) N_bi + N_ri ≫ 0 and R_x(1) + R_y(1) ≈ 0: the image is classified as a block-coded image;

(b) N_bi + N_ri ≫ 0 and R_x(1) + R_y(1) ≫ 0: the image is classified as a graphical image;

(c) N_bi + N_ri ≈ 0: the image is classified as a real uncompressed image.
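The three-way decision can be sketched as follows. This is a minimal sketch under stated assumptions: N_bi and N_ri are taken to be the counts of pixels flagged as blocking and ringing artifacts, and the position vectors are plain 0/1 lists indexed from 0. The numeric thresholds standing in for "≫ 0" and "≈ 0" (artifact_threshold, adjacency_threshold) are hypothetical values, not taken from the text, as are the function names.

```python
def lag_one_correlation(pos):
    """R(1) = sum_i pos(i) * pos(i + 1) for a 0/1 vector (0-based list here)."""
    return sum(a * b for a, b in zip(pos, pos[1:]))


def classify_image(n_blocking, n_ringing, pos_x, pos_y,
                   artifact_threshold=100, adjacency_threshold=10):
    """Cases (a)-(c); both thresholds are hypothetical stand-ins for '>> 0' / '~ 0'."""
    artifacts = n_blocking + n_ringing
    adjacency = lag_one_correlation(pos_x) + lag_one_correlation(pos_y)
    if artifacts <= artifact_threshold:
        return "uncompressed"  # case (c): N_bi + N_ri ~ 0
    if adjacency <= adjacency_threshold:
        return "block-coded"   # case (a): detections isolated on the block grid
    return "graphical"         # case (b): detections also at adjacent positions


if __name__ == "__main__":
    grid = [1 if (i + 1) % 8 == 0 else 0 for i in range(720)]  # every 8th pixel
    print(classify_image(500, 200, grid, grid[:576]))          # block-coded
```

Case (c) is tested first because, according to the list above, it depends only on N_bi + N_ri; in practice the thresholds would likely need to be normalized to the image size.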

5. Conclusions

In this description, we have proposed a sharpness enhancement technique for video applications that takes into account the fact that the images may be block-coded.

Conventional 2-D techniques for contrast sharpening cannot be applied to the enhancement of video image sequences. In fact, they introduce small artifacts and non-homogeneities that are perfectly acceptable in a still picture but become very visible and annoying in an image sequence; moreover, the visibility of a defect strongly depends on the sequence contents, namely on the amount of detail and motion.

A 3-D unsharp masking (UM) algorithm has been proposed in chapter 2 to overcome these problems; despite its simple formulation, it produces a good noise-insensitive sharpening action without introducing unpleasant overshoot or temporal artifacts. However, this operator cannot be applied to the enhancement of block-coded image sequences, because it emphasizes the coding artifacts, which then become very visible and annoying. For this task, a simple modification has been introduced for processing block-coded image sequences, in order to emphasize the details while reducing the visibility of blocking and ringing effects (see chapter 3).
Hence, in coded image sequences, the behavior of the filter ranges from a quasi-
linear low-pass one in uniform areas, to a strongly nonlinear high-pass one in detailed areas. 
The variance detector that regulates the filter behavior is embedded in the filter itself, which 
allows a compact expression of the operator. 

Finally, in order to define an enhancement technique for both video and coded image sequences, a mechanism that allows the behavior of the filter to change according to the presence of coding artifacts has been inserted in the operator. To this end, a new distortion measure was introduced that does not need any information about the position and size of the blocks. In fact, most of the previously proposed blockiness metrics assume that blocking artifacts are always located at block boundaries, but in video coding blocking artifacts can appear at any position because of motion-compensated prediction. The algorithm has been evaluated by computer simulations, and the results show that it outperforms the existing algorithms. A specific component of the operator allows this technique to be employed for decoded sequences in a PC environment, helped by its low cost in terms of hardware and/or required processing power. The algorithm has proven to be robust against the heterogeneous contents of real sequences, giving very good performance even on very demanding images.

It should be noted that the above-mentioned embodiments illustrate rather than 
limit the invention, and that those skilled in the art will be able to design many alternative 
embodiments without departing from the scope of the appended claims. In the claims, any 
reference signs placed between parentheses shall not be construed as limiting the claim. The
word "comprising" does not exclude the presence of elements or steps other than those listed 
in a claim. The word "a" or "an" preceding an element does not exclude the presence of a 
plurality of such elements. The invention can be implemented by means of hardware 
comprising several distinct elements, and by means of a suitably programmed computer. In the 
device claim enumerating several means, several of these means can be embodied by one and 
the same item of hardware. The mere fact that certain measures are recited in mutually 
different dependent claims does not indicate that a combination of these measures cannot be 
used to advantage. 
1. A method of sharpness enhancement, the method comprising the steps of:
filtering (SHPF) an input signal (s) to obtain a filtered signal (Zx, Zy);
multiplying (M1, M2, M3) the filtered signal by a controllable fraction (λ, cx, cy) to obtain a multiplied signal; and

adding (A1, A2) the multiplied signal to the input signal (s);

wherein the controllable fraction (cx, cy) depends on a non-linear function (HCF, VCF).

2. A method as claimed in claim 1, wherein the controllable fraction (cx) depends
on an edge sensing function (RF).

3. A method as claimed in claim 2, wherein the controllable fraction (cx) is smaller
than its maximum value for edges smaller than a first threshold value (d1).

4. A method as claimed in claim 2, wherein the controllable fraction (cx) is smaller
than its maximum value for edges exceeding a second threshold value (d2).

5. A method as claimed in claim 1, wherein the non-linear function depends on a 
thin line detection (TLD). 

6. A method as claimed in claim 5, wherein the thin line detection (TLD) is 
carried out on the filtered signal (Zx, Zy). 

7. A method as claimed in claim 1, wherein the filtering is a spatial filtering 
(SHPF) and the method comprises the further steps of: 

temporally filtering (THPF) the input signal (s) to obtain a temporally filtered signal (-Ut); and

adding (A1) the temporally filtered signal (-Ut) to the input signal (s).
8. A method as claimed in claim 2, wherein the edge sensing function (RF) is
variance-based. 

9. A method as claimed in claim 8, wherein the controllable fraction (cx, cy) is
negative for signal variations that are smaller than a certain threshold.

10. A method as claimed in claim 1, wherein the method further comprises a 
blocking and/or ringing effect detection including a comparison of mean values of adjacent 
pixel windows. 

11. A method as claimed in claim 10, wherein the blocking and/or ringing effect 
detection further includes a comparison of variances of the adjacent pixel windows to a 
threshold value. 

12. A method as claimed in claim 10, wherein the method further comprises a block 
size detection based on an autocorrelation (Rx, Ry) of a vector (posx, posy) indicating blocking 
and/or ringing effects. 

13. A device for sharpness enhancement, the device comprising: 

means for filtering (SHPF) an input signal (s) to obtain a filtered signal (Zx, Zy); 

means for multiplying (M1, M2, M3) the filtered signal by a controllable
fraction (λ, cx, cy) to obtain a multiplied signal; and

means for adding (A1, A2) the multiplied signal to the input signal (s);

wherein the device further comprises means for generating the controllable
fraction (cx, cy) by means of a non-linear function (HCF, VCF).

14. A television apparatus, comprising: 

a device for sharpness enhancement as claimed in claim 13; and 
means (D) for displaying the sharpened signal. 